Skip to content

fix(login): prevent env var auth leak in guest sessions#267

Merged
deanq merged 12 commits intomainfrom
deanq/ae-2442-fix-flash-login
Mar 12, 2026
Merged

fix(login): prevent env var auth leak in guest sessions#267
deanq merged 12 commits intomainfrom
deanq/ae-2442-fix-flash-login

Conversation

@deanq
Copy link
Member

@deanq deanq commented Mar 12, 2026

Summary

  • flash login hangs at "Waiting for authorization..." when RUNPOD_API_KEY is set (env var or credentials file from a previous login)
  • Root cause: api_key_override or get_api_key() in all three HTTP session factories conflated "no override provided" with "explicitly no auth" — the or fallback picked up existing credentials and sent a Bearer token, so the server never saw a guest caller
  • Introduces _UNSET sentinel default so api_key_override=None (login flow) means "send no auth" while omitting the parameter preserves existing credential resolution

Test plan

  • New test: api_key_override=None with RUNPOD_API_KEY set — all 3 session factories omit Authorization header
  • New test: require_api_key=False with env var set — GraphQL client session has no auth (exact bug scenario)
  • Existing tests continue to pass (default credential resolution unchanged)
  • make quality-check passes (85% coverage)
  • Manual: set RUNPOD_API_KEY, run flash login --force, verify it completes
  • Manual: unset all keys, run flash login, verify create request succeeds

deanq added 8 commits March 11, 2026 17:08
…t sessions

Session factories conflated "no override provided" with "explicitly no auth"
via `api_key_override or get_api_key()`. When RunpodGraphQLClient set
api_key_override=None for require_api_key=False, the `or` fallback picked up
RUNPOD_API_KEY from the environment, causing flash login to send a Bearer token
instead of acting as a guest. The server then never returned an apiKey.

Introduces _UNSET sentinel so api_key_override=None means "send no auth" while
omitting the parameter preserves existing credential resolution.
…uth scenarios

- Replace mocked get_api_key test with real credential resolution (fresh user)
- Add test: credentials file has key + require_api_key=False → no auth header
- Add test: both env var and credentials file + require_api_key=False → no auth header
Remove _read_credentials references (no longer exists), RUNPOD_CREDENTIALS_FILE
env var usage, and clear=True from patch.dict calls. All tests now use the
autouse isolate_credentials_file fixture and write runpod-python profile format
([default] section). Also fix credentials.py to read CREDENTIAL_FILE dynamically
via module reference (enables monkeypatch) and catch TOMLDecodeError from
get_credentials() for corrupt file handling.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes flash login guest-session hangs when RUNPOD_API_KEY (env var or stored credentials) is present by ensuring login/guest flows can explicitly disable auth header injection, while preserving default credential resolution for authenticated flows. Also adds migration support for legacy ~/.config/runpod/credentials.toml into the current runpod-python credential location.

Changes:

  • Introduce an _UNSET sentinel for HTTP session factories so api_key_override=None means “no auth”, while omitting the arg still resolves credentials via get_api_key().
  • Rework credential handling to wrap runpod-python credential helpers and add legacy XDG credentials migration.
  • Update/expand unit tests and add a shared autouse fixture to isolate credential files during tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/runpod_flash/core/utils/http.py Adds _UNSET sentinel behavior to prevent accidental credential fallback when api_key_override=None.
src/runpod_flash/core/credentials.py Switches to runpod-python credential APIs and adds legacy XDG migration helper.
src/runpod_flash/cli/commands/login.py Calls legacy credential migration during login flow before polling for approval.
tests/conftest.py Updates isolate_credentials_file fixture to monkeypatch runpod-python credential file path and clear RUNPOD_API_KEY.
tests/unit/test_login_extended.py Expands login and GraphQL session tests to cover “no-auth despite env/file creds” scenarios.
tests/unit/test_login.py Updates login-related tests to use the new credential isolation fixture.
tests/unit/test_credentials_extended.py Updates credential/http helper tests for the new config.toml format and explicit None override behavior.
tests/unit/test_credentials.py Updates core credential tests for config.toml format and new isolation approach.
tests/unit/test_credential_migration.py Adds coverage for migrating legacy XDG credentials to the new location.
Comments suppressed due to low confidence (1)

src/runpod_flash/core/utils/http.py:128

  • The docstrings still describe api_key_override as an optional API key that falls back to RUNPOD_API_KEY when not provided, but the new semantics differentiate “parameter omitted” vs “explicitly None” to disable auth. Please update the Args: docs (and the aiohttp bullet list) to explicitly state: omitted => resolve via get_api_key(), None => send no auth header.
    Args:
        timeout: Request timeout in seconds. Defaults to 30.0.
        api_key_override: Optional API key to use instead of RUNPOD_API_KEY env var.
                         Used for propagating API keys from load-balanced to worker endpoints.

    Returns:
        Configured httpx.AsyncClient with User-Agent and Authorization headers

    Example:
        async with get_authenticated_httpx_client() as client:
            response = await client.post(url, json=data)

        # With custom timeout
        async with get_authenticated_httpx_client(timeout=60.0) as client:
            response = await client.get(url)

        # With API key override (for propagation)
        async with get_authenticated_httpx_client(api_key_override=context_key) as client:
            response = await client.post(url, json=data)
    """
    from .user_agent import get_user_agent

    headers = {
        "User-Agent": get_user_agent(),
        "Content-Type": "application/json",
    }
    api_key = get_api_key() if api_key_override is _UNSET else api_key_override
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    timeout_config = timeout if timeout is not None else 30.0
    return httpx.AsyncClient(timeout=timeout_config, headers=headers)


def get_authenticated_requests_session(
    api_key_override: Union[Optional[str], object] = _UNSET,
) -> requests.Session:
    """Create requests Session with RunPod authentication and User-Agent.

    Automatically includes:
    - User-Agent header identifying flash client and version
    - Content-Type: application/json
    - Authorization header if an api key is available

    Provides a centralized place to manage authentication headers for
    synchronous RunPod HTTP requests.

    Args:
        api_key_override: Optional API key to use instead of RUNPOD_API_KEY env var.
                         Used for propagating API keys from load-balanced to worker endpoints.

    Returns:
        Configured requests.Session with User-Agent and Authorization headers

    Example:
        session = get_authenticated_requests_session()
        response = session.post(url, json=data, timeout=30.0)
        # Remember to close: session.close()

        # Or use as context manager
        import contextlib
        with contextlib.closing(get_authenticated_requests_session()) as session:
            response = session.post(url, json=data)

        # With API key override (for propagation)
        with contextlib.closing(get_authenticated_requests_session(api_key_override=context_key)) as session:
            response = session.post(url, json=data)
    """
    from .user_agent import get_user_agent

    session = requests.Session()
    session.headers["User-Agent"] = get_user_agent()
    session.headers["Content-Type"] = "application/json"

    api_key = get_api_key() if api_key_override is _UNSET else api_key_override
    if api_key:
        session.headers["Authorization"] = f"Bearer {api_key}"

    return session


def get_authenticated_aiohttp_session(
    timeout: float = 300.0,
    api_key_override: Union[Optional[str], object] = _UNSET,
    use_threaded_resolver: bool = True,
) -> ClientSession:
    """Create aiohttp ClientSession with RunPod authentication and User-Agent.

    Automatically includes:
    - User-Agent header identifying flash client and version
    - Content-Type: application/json
    - Authorization header if RUNPOD_API_KEY is set or api_key_override provided
    - 5-minute default timeout (configurable)
    - TCPConnector with ThreadedResolver for DNS resolution (configurable)

    Args:
        timeout: Total timeout in seconds (default: 300s for GraphQL operations)
        api_key_override: Optional API key to use instead of RUNPOD_API_KEY.
                         Used for propagating API keys from mothership to worker endpoints.
        use_threaded_resolver: Whether to use TCPConnector with ThreadedResolver.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

deanq added 3 commits March 12, 2026 10:10
Replace .env-based API key setup with flash login as the primary
authentication method across README, skeleton template, init command
output, and CLI docs.

- Quick Start sections now use flash login instead of cp .env.example
- Added Authentication section to skeleton template README
- Simplified init command post-creation steps
- Added flash-login.md CLI documentation
- Updated test_init to verify flash login in steps table
- Add _resolve_api_key() with TypeError guard for invalid override types
- Wrap get_credentials() in try/except during migration pre-check
- Validate existing api_key with isinstance + .strip() before skipping
- Fix docstring to match non-interactive auto-migration behavior
- Move migration call to after successful auth (not before polling)
- Simplify _OLD_XDG_PATHS tuple to _OLD_XDG_PATH constant
- Narrow migration except clause from Exception to (OSError, ValueError)
- Replace Rich Table _cells access with console render-to-string
- Update migration test patches from _OLD_XDG_PATHS tuple to _OLD_XDG_PATH
@deanq deanq requested review from KAJdev and jhcipar March 12, 2026 17:40
@deanq deanq merged commit 6316289 into main Mar 12, 2026
4 checks passed
@deanq deanq deleted the deanq/ae-2442-fix-flash-login branch March 12, 2026 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants